From Fundamentals to Advanced Deep Learning (With Theory)
PyTorch is an open-source deep learning framework developed by Meta (Facebook). It is widely used in research and industry due to its flexibility and ease of debugging.
Key characteristics:
- Dynamic, define-by-run computation graphs that make debugging straightforward
- A NumPy-like tensor API with GPU acceleration
- Built-in automatic differentiation through autograd
- A broad ecosystem of companion libraries such as torchvision and torchaudio
PyTorch can be installed using pip. Whether a CUDA-capable GPU can be used is checked at runtime rather than at install time.
pip install torch torchvision torchaudio
import torch
print(torch.__version__)
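A quick way to confirm the installation and select a device, using torch.cuda.is_available():

import torch

# Pick the GPU if one is visible, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)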
A tensor is a multi-dimensional array, similar to NumPy arrays, but with GPU acceleration and automatic differentiation support.
x = torch.tensor([1, 2, 3])   # tensor from a Python list
y = torch.rand(2, 3)          # uniform random values in [0, 1)
z = torch.zeros(3, 3)         # all zeros
Tensors support mathematical operations and broadcasting.
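A small sketch of elementwise arithmetic and broadcasting, where a (2, 3) tensor is combined with a length-3 row:

a = torch.rand(2, 3)
b = torch.tensor([1.0, 2.0, 3.0])   # shape (3,) broadcasts across the rows of a
print(a + b)                        # elementwise sum, shape (2, 3)
print(a * 2)                        # scalar multiplication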
Autograd is PyTorch’s automatic differentiation engine. It tracks operations on tensors and computes gradients during backpropagation.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
y.backward()      # computes dy/dx
print(x.grad)     # 3 * x**2 = 12.0 at x = 2
This mechanism forms the backbone of neural network training.
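The same mechanism scales to several parameters at once. A minimal sketch with the weight and bias of a tiny linear model (the names w and b are illustrative):

w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
x_in = torch.tensor(3.0)
loss = (w * x_in + b - 2.0) ** 2   # squared error of the prediction against a target of 2.0
loss.backward()                    # fills w.grad and b.grad
print(w.grad, b.grad)              # 6.0 and 2.0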
Neural networks in PyTorch are created by subclassing torch.nn.Module.
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)   # one fully connected layer: 10 inputs, 1 output

    def forward(self, x):
        return self.fc(x)
The forward() method defines the computation flow.
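Assuming SimpleNet as defined above, a forward pass on a small batch looks like this; calling the model object invokes forward():

model = SimpleNet()
batch = torch.rand(4, 10)   # 4 samples, 10 features each
out = model(batch)          # equivalent to model.forward(batch)
print(out.shape)            # torch.Size([4, 1])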
Loss functions measure model error, while optimizers update model weights.
criterion = nn.MSELoss()                                   # mean squared error loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # stochastic gradient descent
Common losses include nn.MSELoss for regression, nn.CrossEntropyLoss for multi-class classification, and nn.BCELoss (or nn.BCEWithLogitsLoss) for binary targets.
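A brief sketch of nn.CrossEntropyLoss, which expects raw logits of shape (batch, classes) and integer class labels:

logits = torch.randn(4, 3)             # raw scores for 4 samples over 3 classes
targets = torch.tensor([0, 2, 1, 2])   # ground-truth class indices
ce = nn.CrossEntropyLoss()
print(ce(logits, targets))             # scalar loss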
The training loop performs forward pass, loss computation, backpropagation, and parameter updates.
# x_train and y_train are assumed to be prepared tensors matching the model's input and output sizes
for epoch in range(100):
    optimizer.zero_grad()                # clear gradients from the previous step
    outputs = model(x_train)             # forward pass
    loss = criterion(outputs, y_train)   # compute the error
    loss.backward()                      # backpropagation
    optimizer.step()                     # update the parameters
DataLoaders handle batching, shuffling, and efficient data loading.
from torch.utils.data import DataLoader, TensorDataset
dataset = TensorDataset(x_train, y_train)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
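Once the dataset is wrapped in a DataLoader, the training loop iterates over mini-batches instead of the full tensors:

for epoch in range(10):
    for xb, yb in loader:                # each iteration yields one shuffled batch of 32
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()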
During evaluation, training-specific layers like Dropout must be disabled.
model.eval()                      # switch Dropout and BatchNorm to inference behavior
with torch.no_grad():             # disable gradient tracking to save memory
    predictions = model(x_test)
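For a classifier, evaluation typically compares predicted class indices to the labels. A sketch assuming x_test and y_test form a held-out set and the model outputs one logit per class:

model.eval()
with torch.no_grad():
    preds = model(x_test).argmax(dim=1)           # predicted class per sample
    accuracy = (preds == y_test).float().mean()
print(accuracy.item())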
Regularization prevents overfitting and improves generalization.
nn.Dropout(p=0.5)   # randomly zeroes half of the activations during training

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,
    weight_decay=1e-4   # L2 penalty on the weights
)
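A minimal sketch of where a Dropout layer usually sits, between the fully connected layers of a small network (the layer sizes are illustrative):

class RegularizedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 64)
        self.drop = nn.Dropout(p=0.5)   # active in train() mode, disabled in eval() mode
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.drop(x)
        return self.fc2(x)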
Proper initialization prevents vanishing and exploding gradients.
nn.init.kaiming_normal_(model.fc.weight)   # He initialization, suited to ReLU activations
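To initialize every Linear layer in a model at once, a helper can be passed to Module.apply; init_weights here is an illustrative name:

def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight)   # He initialization for the weights
        nn.init.zeros_(m.bias)              # start biases at zero

model.apply(init_weights)   # applies the function to every submodule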
Schedulers dynamically adjust learning rate to improve convergence.
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=10, gamma=0.1   # multiply the learning rate by 0.1 every 10 epochs
)
scheduler.step()   # typically called once per epoch, after optimizer.step()
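In context, the scheduler advances once per epoch, after the inner optimization steps:

for epoch in range(100):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()   # decay the learning rate on the epoch boundary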
CNNs are designed for image data using convolutional filters.
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 32, 3)         # 1 input channel, 32 filters, 3x3 kernel
        self.fc = nn.Linear(32 * 26 * 26, 10)   # assumes 28x28 inputs (e.g. MNIST): 28 - 3 + 1 = 26

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = x.view(x.size(0), -1)               # flatten to (batch, features)
        return self.fc(x)
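A quick shape check on a dummy batch of 28x28 grayscale images confirms the flattened size used above:

cnn = CNN()
images = torch.randn(8, 1, 28, 28)   # batch of 8 single-channel 28x28 images
print(cnn(images).shape)             # torch.Size([8, 10])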
Transfer learning reuses pre-trained models to reduce training time and data requirements.
from torchvision import models

model = models.resnet18(pretrained=True)   # newer torchvision versions use weights=models.ResNet18_Weights.DEFAULT
for param in model.parameters():
    param.requires_grad = False            # freeze the pretrained backbone
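The frozen backbone is then finished with a new classification head sized for the target task; num_classes here is a placeholder:

num_classes = 5                                                   # illustrative number of target classes
model.fc = nn.Linear(model.fc.in_features, num_classes)           # new head, trainable by default
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)     # train only the new layer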
Gradient clipping prevents exploding gradients in deep networks.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # call after loss.backward(), before optimizer.step()
Mixed precision improves training speed and reduces memory usage.
from torch.cuda.amp import autocast, GradScaler   # moved to torch.amp in newer PyTorch releases

scaler = GradScaler()   # scales losses to avoid float16 gradient underflow
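A sketch of one training step under mixed precision, following the usual autocast/GradScaler pattern (a CUDA device is assumed):

for xb, yb in loader:
    optimizer.zero_grad()
    with autocast():                    # run the forward pass in float16 where safe
        outputs = model(xb)
        loss = criterion(outputs, yb)
    scaler.scale(loss).backward()       # backpropagate the scaled loss
    scaler.step(optimizer)              # unscale gradients and update parameters
    scaler.update()                     # adjust the scale factor for the next step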
Saving models allows reuse and deployment.
torch.save(model.state_dict(), "model.pth")      # save only the learned parameters
model.load_state_dict(torch.load("model.pth"))   # the model architecture must be constructed first
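For resuming training, a common pattern is to checkpoint the optimizer state together with the model weights; the file name is illustrative:

checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
}
torch.save(checkpoint, "checkpoint.pth")

state = torch.load("checkpoint.pth")
model.load_state_dict(state["model"])
optimizer.load_state_dict(state["optimizer"])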